Three Layers — Tool, MCP, Skill

Agent systems need three different kinds of capability surfaces: atomic actions, externally delivered primitives, and workflow instructions. In Codex terms, MCP servers can expose tools, resources, and prompts; skills package reusable workflow guidance; and plugins can bundle skills, MCP servers, apps, and assets for distribution (MCP specification, Codex skills docs, Codex plugins docs). The useful mental model is not "which one is better?" It is "which layer should own this kind of change?"

The Stack

Layer	What it owns	Change cost	Best use
Tool call	One atomic action with a schema	High if built into the agent, medium if external	Deterministic work such as search, file IO, database reads, API calls
MCP primitives	Tools, resources, or prompts exposed by an external MCP server	Medium	Shared integrations, team-owned services, third-party capabilities
Skill	Written workflow instructions the model reads	Low	Multi-step procedures, conventions, reviews, writing workflows
Plugin	Packaging and distribution	Medium	Bundling skills, MCP servers, and apps into an installable unit

I use "tool call" for the generic agent-systems abstraction. OpenAI's function-calling guide describes application-defined function tools with schemas, while Codex docs define MCP servers, skills, and plugins as separate extension surfaces (OpenAI function calling guide, Codex glossary).

Tool Call

A tool call is a typed action. The model supplies arguments, the runtime executes code, and the result returns as structured data. The value comes from rigidity: the action has a schema, a name, a permission boundary, and a predictable side effect.

That rigidity is why tool calls should own atomic work. If the agent needs to query a database, send an email, read a file, or call an API, code should do it. The model can decide when the action is relevant, but the action itself should not be improvised in prose.

The failure mode is overloading. If you encode a whole workflow into one tool, every process change becomes a code change. The tool becomes hard to test, hard to review, and hard to reuse.

MCP

The Model Context Protocol gives agents a standard way to connect to external servers that provide tools, resources, and prompts. The official architecture uses a host, an MCP client, and an MCP server; the data layer uses JSON-RPC 2.0, while the transport can run over stdio or Streamable HTTP (MCP architecture overview, MCP specification).

Architecturally, an MCP tool is still a tool. It has a callable shape and returns data. MCP resources are URI-identified context objects, and MCP prompts are reusable templates served through the protocol (MCP tools spec, MCP resources spec). The difference is ownership. The implementation lives outside the agent core, often in a separate process or service. That lets a database team, internal platform team, or third-party package maintain the capability without changing the agent itself.

The failure mode is treating MCP as a reasoning layer. MCP does not decide how to solve the task. It transports capabilities and context. The model still has to choose the right call, and skills or system instructions still have to guide multi-step behavior.

Skill

Codex docs define skills as reusable workflow packages that include instructions and can include supporting scripts or references. Codex uses progressive disclosure: the model initially sees skill names and descriptions, then loads the full SKILL.md when the task calls for it (Codex skills docs, Codex glossary).

A skill is interpreted text. It does not expose a callable API by itself. It tells the model how to coordinate existing capabilities: which tools to use, what order to follow, what constraints matter, and what checks should happen before delivery. MCP prompts can also package reusable prompt templates, but they live in the protocol layer; Codex skills are local workflow packages with instructions, optional scripts, and references.

That makes skills good for work with judgment. Article rewriting, code review, incident triage, document rendering, and research workflows all have steps, exceptions, and taste. Encoding that into a markdown workflow gives you a fast way to update the process without rewriting the tools underneath it.

Plugin

Plugin is the packaging layer. Codex docs describe plugins as installable bundles that can include skills, MCP servers, apps, and other assets (Codex plugins docs, Build plugins docs). That makes plugins orthogonal to the tool-versus-skill distinction.

A plugin might ship an MCP server for a database, a skill that explains the review workflow for that database, and an app surface for humans to inspect results. The plugin does not replace tools or skills. It distributes them.

Boundary Rules

The clean boundary is simple:

Question	Put it in
Does it perform one deterministic action?	Tool call
Does it expose external tools, resources, or prompt templates over a protocol?	MCP
Does it describe how to run a workflow?	Skill
Does it bundle capabilities for installation?	Plugin

If a skill says "query the database," an actual tool must exist. If a tool tries to "handle onboarding well," it is probably hiding a workflow that belongs in a skill. If an MCP server contains broad business process logic, it may become a remote monolith instead of a capability provider.

Valid Combinations

You can build with any subset. A small internal agent may only need built-in tool calls. A writing or review assistant may need tools plus skills. A company-wide system with many integrations usually needs MCP servers so teams can own their services independently. A distributed product needs plugins to package and install the pieces.

The mature pattern is stable actions at the bottom and flexible instructions at the top. Keep tools narrow. Keep MCP servers accountable to external capability ownership. Keep skills readable enough that a human can audit the workflow. Package with plugins when installation and reuse become the problem.

My Take

Most agent architecture mistakes come from putting change in the wrong layer. Teams put process logic into code and then wonder why the agent is expensive to update. Or they put deterministic actions into instructions and then wonder why the model behaves inconsistently. The stack works when each layer absorbs the kind of change it was built for.

Takeaways

Tool calls give an agent reliable actions. MCP delivers external tools, resources, and prompt templates under a standard protocol. Skills tell the model how to coordinate actions across a workflow. Plugins package the pieces for installation and reuse. The architecture stays healthy when deterministic behavior lives in code, shared integrations live behind MCP, and evolving process knowledge lives in skills.

References

author: Arii tag: #ai links: [[Jailbreaking LLMs]], [[Small LLMs — Use Cases and Limits]], [[Multi-Token Prediction]]